Maximum Entropy Tagging with Binary and Real-Valued Features

نویسندگان

Vanessa Sandrini

Marcello Federico

Mauro Cettolo

چکیده

Recent literature on text-tagging reported successful results by applying Maximum Entropy (ME) models. In general, ME taggers rely on carefully selected binary features, which try to capture discriminant information from the training data. This paper introduces a standard setting of binary features, inspired by the literature on named-entity recognition and text chunking, and derives corresponding realvalued features based on smoothed logprobabilities. The resulting ME models have orders of magnitude fewer parameters. Effective use of training data to estimate features and parameters is achieved by integrating a leaving-one-out method into the standard ME training algorithm. Experimental results on two tagging tasks show statistically significant performance gains after augmenting standard binaryfeature models with real-valued features.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning Structured Information in Natural Language Applications

متن کامل

Tagging Unknown Words with Raw Text Features

Processing unknown words is disproportionately important because of their high information content. It is crucial in domains with specialist vocabularies where relevant training material is scarce, for example: biological text. Unknown word processing often begins with Part of Speech (POS) tagging, where accuracy is typically 10% worse than on known words. We demonstrate that features extracted...

متن کامل

Using PCA with LVQ, RBF, MLP, SOM and Continuous Wavelet Transform for Fault Diagnosis of Gearboxes

A new method based on principal component analysis (PCA) and artificial neural networks (ANN) is proposed for fault diagnosis of gearboxes. Firstly the six different base wavelets are considered, in which three are from real valued and other three from complex valued. Two wavelet selection criteria Maximum Energy to Shannon Entropy ratio and Maximum Relative Wavelet Energy are used and compared...

متن کامل

A Weighted Maximum Entropy Language Model for Text Classification

The Maximum entropy (ME) approach has been extensively used in various Natural Language Processing tasks, such as language modeling, partof-speech tagging, text classification and text segmentation. Previous work in text classification was conducted using maximum entropy modeling with binary-valued features or counts of feature words. In this work, we present a method for applying Maximum Entro...

متن کامل

NING MA et al: FUSION OF WORD CLUSTERING FEATURES FOR TIBETAN PART OF SPEECH TAGGING

Tibetan Part of Speech (POS) tagging, the foundation of Tibetan natural language processing, judges word classification according to contextual information of words. Based on the framework of the maximum entropy model, the paper studied the fusion of morphological features for Tibetan part of speech with maximum entropy model with the integration of word clustering features. Experimental result...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2006

Maximum Entropy Tagging with Binary and Real-Valued Features

نویسندگان

چکیده

منابع مشابه

Learning Structured Information in Natural Language Applications

Tagging Unknown Words with Raw Text Features

Using PCA with LVQ, RBF, MLP, SOM and Continuous Wavelet Transform for Fault Diagnosis of Gearboxes

A Weighted Maximum Entropy Language Model for Text Classification

NING MA et al: FUSION OF WORD CLUSTERING FEATURES FOR TIBETAN PART OF SPEECH TAGGING

عنوان ژورنال:

اشتراک گذاری